[Feature] Support downloading dataset from OpenMind #1792
@tonysy This PR is ready for review. We are looking forward to any feedback you may have on it, thanks : )
@liushz hello, I have noticed that some other friends are also following the integration progress of this PR. 3 days have gone by with no progress; can you help me review this? thanks 😄
@acylam @zhulinJulia24 please help me review this PR, thanks~
cc @MaiziXiao
By the way, it looks like the lint error is not related to my PR ~
OpenCompass now supports downloading from ModelScope, Hugging Face, and OpenCompass's own dataset download service. We do not plan to add more data download sources, especially when one only supports a specific dataset like GSM8K.
@MaiziXiao There are 39 datasets in total available in the Modelers community. If all of them are supported in this PR, can you merge it?
Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
Modelers is a popular open-source community that has recently gained a lot of attention. It includes some popular datasets and models that support Ascend NPUs. By using the accompanying openMind library, users can easily train models on Ascend NPUs.
We hope to integrate the dataset resources of the OpenMind community into opencompass through this PR. After that, we plan to integrate the evaluation capabilities of opencompass into the OpenMind library to facilitate users in conducting model evaluations, and play a role in promoting opencompass at the same time.
By simply setting the environment variable DATASET_SOURCE=OpenMind, users can use datasets from the OpenMind community when using opencompass.
Modification
This PR aims to establish the process for integrating opencompass with the dataset resources of the OpenMind community, and uses the GSM8K dataset as a pilot for this integration. More datasets will be supported soon.
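The source switch described here can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: the helper `resolve_dataset_id`, the trimmed-down `DATASETS_MAPPING`, and the local fallback path are hypothetical, and only the `DATASET_SOURCE=OpenMind` variable, the `om_id` key, and the `OpenCompass/gsm8k` id come from the PR description. The actual download through the openMind library is deliberately left out.

```python
import os

# Hypothetical, trimmed-down stand-in for the DATASETS_MAPPING named in this PR.
DATASETS_MAPPING = {
    "opencompass/gsm8k": {
        "om_id": "OpenCompass/gsm8k",  # id taken from the PR description
        "local": "./data/gsm8k/",      # assumed local fallback path
    },
}


def resolve_dataset_id(dataset_id, env=None):
    """Return the OpenMind id when DATASET_SOURCE=OpenMind, else the local path.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    entry = DATASETS_MAPPING[dataset_id]
    if env.get("DATASET_SOURCE") == "OpenMind":
        om_id = entry.get("om_id")
        if om_id is None:
            # Dataset has no OpenMind counterpart registered in the mapping.
            raise KeyError(f"{dataset_id} has no om_id entry")
        return om_id
    return entry["local"]


if __name__ == "__main__":
    print(resolve_dataset_id("opencompass/gsm8k", {"DATASET_SOURCE": "OpenMind"}))
    print(resolve_dataset_id("opencompass/gsm8k", {}))
```

Keeping the lookup in one helper means a dataset loader only needs to ask for an id and can stay unaware of which community the data comes from; datasets without an `om_id` fail loudly instead of silently falling back.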
The modifications are:
opencompass/datasets/gsm8k.py: Support using the openMind library to automatically download the GSM8K dataset from the OpenMind community when the environment variable DATASET_SOURCE=OpenMind is set.
opencompass/utils/datasets.py: Support getting the dataset id for the OpenMind community from the variable DATASETS_MAPPING via a new key om_id (om is short for OpenMind).
opencompass/utils/datasets_info.py: Add "om_id": "OpenCompass/gsm8k", to the opencompass/gsm8k dict; the string OpenCompass/gsm8k comes from the GSM8K dataset in the OpenMind community.
tests/dataset/test_om_datasets.py: Add a test script for datasets from the OpenMind community.
BC-breaking (Optional)
Not related.
Use cases (Optional)
We verified this PR with Python 3.10.16 on Windows as follows:
Install dependencies and launch:
Running result:
Checklist
Before PR:
After PR: